NSF PAR Search | NSF Public Access Repository


DYAD: Locality-aware Data Management for accelerating Deep Learning Training

https://doi.org/10.1109/SBAC-PAD63648.2024.00010

Devarajan, Hariharan; Lumsden, Ian; Wang, Chen; Georgouli, Konstantia; Scogland, Tom; Yeom, Jae-Seung; Taufer, Michela (November 2024, IEEE)

Full Text Available

RAJA Performance Suite: Performance Portability Analysis with Caliper and Thicket

https://doi.org/10.1109/SCW63240.2024.00162

Pearce, Olga; Burmark, Jason; Hornung, Rich; Bogale, Befikir; Lumsden, Ian; McKinsey, Michael; Yokelson, Dewi; Boehme, David; Brink, Stephanie; Taufer, Michela; et al (November 2024, IEEE)

Full Text Available

xAMM: “Attention” to Details Improves Cross-Platform Prediction Accuracy

https://doi.org/10.1109/CCGRID64434.2025.00067

Dhakal, Aakash Raj; Islam, Tanzima Z; Dey, Arunavo; Nichols, Daniel; Bhatele, Abhinav; Patki, Tapasya; Scogland, Tom; Yeom, Jae-Seung (May 2025, IEEE)

Full Text Available

OpenMP Kernel Language Extensions for Performance Portable GPU Codes

Tian, Shilei; Scogland, Tom; Chapman, Barbara; Doerfert, Johannes (November 2023, Association for Computing Machinery)
Badia, Rosa M; Mohror, Kathryn (Ed.)
In contemporary high-performance computing architectures, the integration of GPU accelerators has become increasingly prevalent. To harness the full potential of these accelerators, developers often resort to vendor-specific kernel languages, such as CUDA. While this approach ensures optimal efficiency, it inherently compromises portability and engenders vendor dependency. Existing portable programming models, such as OpenMP, while promising, demand extensive code rewriting due to their fundamental difference from kernel languages. In this work, we introduce extensions to LLVM OpenMP, transforming it into a versatile and performance portable kernel language for GPU programming. These extensions allow for the seamless porting of programs from kernel languages to high-performance OpenMP GPU programs with minimal modifications. To evaluate our extension, we implemented a proof-of-concept prototype that contains a subset of extensions we proposed. We ported six established CUDA proxy and benchmark applications and evaluated their performance on both AMD and NVIDIA platforms. By comparing with native versions (HIP and CUDA), our results show that OpenMP, augmented with our extensions, can not only match but also in some cases exceed the performance of kernel languages, thereby offering performance portability with minimal effort from application developers.
Full Text Available

Optimizing I/O for an Exascale Implicit Kinetic Plasma Simulation using the Rabbit Storage System

https://doi.org/10.1109/CLUSTERWorkshops65972.2025.11164212

Lumsden, Ian; Devarajan, Hariharan; Yildirim, Izzet; Markidis, Stefano; Hu, Andong; Peng, Ivy; Pennati, Luca; Yokelson, Dewi; Brink, Stephanie; Pearce, Olga; et al (September 2025, IEEE)

Full Text Available
